Information

This R Markdown document was created as part of an individual assignment for SMM634 at Bayes Business School, City St George's, University of London in Term 1 2025-26.

Bonus Interactive Dashboard Available

This static report is accompanied by a live R Shiny dashboard.

Launch interactive dashboard to explore the models described below!

1 Introduction

Healthcare expenditure represents a substantial share of economic activity in many high-income countries. In the United States, national health spending is projected to reach 20.3% of Gross Domestic Product (GDP) by 2033, up from 17.6% in 2023 (Keehan et al. 2025). Accurately modelling healthcare utilisation and expenditure is essential for forecasting budgetary pressures, planning service capacity, and designing policies that can meet the needs of an ageing population in an efficient and sustainable way.

The Agency for Healthcare Research and Quality (AHRQ) publishes the Medical Expenditure Panel Survey (MEPS), which provides detailed information on healthcare utilisation, associated expenditures, insurance coverage, and socio-demographic characteristics, all standardised to represent a full calendar year for each respondent (Agency for Healthcare Research and Quality 2023). These data enable one to quantify how healthcare use and costs vary across population groups and to identify factors associated with higher or lower spending, which is valuable for targeting interventions and evaluating potential policy reforms.

From a modelling perspective, healthcare expenditure data pose several distributional challenges - they are non-negative, highly right-skewed (a few patients incur massive costs), and a large proportion of patients have no recorded healthcare usage. Understanding both whether individuals access healthcare and, conditional on doing so, how much is spent is crucial for characterising demand and the resulting financial burden. This report uses 2012 MEPS data to investigate two primary outcomes related to physician services: the number of doctor consultations (dvisit), capturing healthcare utilisation, and the annual expenditures on doctor visits (dvexpend), capturing the associated financial cost.

2 Methods

The 2012 MEPS dataset contains 10,638 observations on US adults aged 18–65 years. Tables 2.1 & 2.2 show the categorical and quantitative variables for both the full sample and a subsample of individuals with positive expenditure.

The covariates include demographic characteristics (age, gender, ethnicity, region, and education), socio-economic status (income), health indicators (BMI, self-reported general and mental health, hypertension, and hyperlipidemia). The number of non-physician visits (ndvisit) was retained as a proxy for the latent propensity to seek care.

Table 2.1: Frequency of categorical variables in the full sample of patients and the subset of patients with positive healthcare expenditure.
Variable Category
gender Female 0.53 0.62
Male 0.47 0.38
ethnicity White 0.70 0.71
Black 0.21 0.20
Native American 0.01 0.01
Others 0.09 0.08
education_cat_detailed Less than High School 0.22 0.18
High School Graduate 0.31 0.29
Some College 0.24 0.25
College Graduate 0.15 0.17
Post-Graduate 0.09 0.11
region Northeast 0.15 0.16
Midwest 0.19 0.21
South 0.39 0.38
West 0.26 0.24
hypertension No 0.75 0.65
Yes 0.25 0.35
hyperlipidemia No 0.77 0.68
Yes 0.23 0.32
Table 2.2: Summary statistics, mean (standard deviation), for continuous variables in the full sample and among individuals with positive healthcare expenditure.
Variable Description
bmi Body mass index 28.03 (6.41) 28.68 (6.79)
age Age (years) 40.25 (13.66) 43.32 (13.55)
education Education (no. of years) 12.77 (2.90) 13.13 (2.81)
income Income (USD) 60,817 (51,451) 65,688 (54,115)
dvisit Doctor visits 2.13 (3.63) 3.94 (4.14)
ndvisit Non-doctor visits 0.94 (2.91) 1.51 (3.62)
dvexpend Doctor expenditure (USD) 481.07 (1,646.79) 889.71 (2,156.92)
ndvexpend Non-doctor expenditure (USD) 159.59 (790.31) 266.51 (1,038.92)

2.1 Health-care expenditure

Doctor-visit expenditure (dvexpend) is semi-continuous, characterised by a point mass at zero (45.9% of observations) and a heavy right skew among positive values. To address this, a two-part model was used:

  1. Extensive margin: A Probit model estimates the probability of incurring any expenditure
  2. Intensive margin: A Generalised Linear Model (GLM) with a Gamma distribution and log link models the magnitude of costs, conditional on them being positive. This approach avoids the retransformation bias present in log-normal models when dealing with heteroscedastic residuals (Blough, Madden, and Hornbrook 1999; Deb and Norton 2018)

For both parts, the choice of covariates was informed by a preliminary Lasso regression using the full set of candidate predictors, combined with prior domain knowledge. ndvisit was retained a priori in both parts as a proxy for latent propensity to use healthcare services. The final two-part model is shown in Equations (2.1) & (2.2). \[\begin{align} % Part 1: Probit \text{Part 1: }& \nonumber\\ \mathbb{P}[\text{dvexpend}_i > 0] &= \Phi(\eta_{1i}) \nonumber \\ \eta_{1i} &= \gamma_0 + \gamma_1 \text{age}_i + \gamma_2 \mathrm{I}(\text{gender}_i) + \gamma_3 \text{bmi}_i \nonumber \\ &\quad + \sum_{k=2}^{4} \gamma_{4k} \mathrm{I}(\text{ethnicity}_i = k) + \sum_{k=2}^{4} \gamma_{5k} \mathrm{I}(\text{region}_i = k) \nonumber \\ &\quad + \sum_{k=2}^{5} \gamma_{6k} \mathrm{I}(\text{education\_cat\_detailed}_i = k) \nonumber \\ &\quad + \sum_{k=2}^{5} \gamma_{7k} \mathrm{I}(\text{general}_i = k) + \sum_{k=2}^{5} \gamma_{8k} \mathrm{I}(\text{mental}_i = k) \nonumber \\ &\quad + \gamma_9 \mathrm{I}(\text{hypertension}_i) + \gamma_{10} \mathrm{I}(\text{hyperlipidemia}_i) \nonumber \\ &\quad + \gamma_{11} \text{income}_i + \gamma_{12} \text{ndvisit}_i \tag{2.1} \\[10pt] % Part 2: Gamma \text{Part 2: }& \nonumber\\ \mathbb{E}[\text{dvexpend}_i &\mid \text{dvexpend}_i > 0] = \mu_i \nonumber \\ \ln(\mu_i) &= \beta_0 + \beta_1 \text{age}_i + \beta_2 \mathrm{I}(\text{gender}_i) \nonumber \\ &\quad + \sum_{k=2}^{4} \beta_{3k} \mathrm{I}(\text{region}_i = k) + \sum_{k=2}^{5} \beta_{4k} \mathrm{I}(\text{education\_cat\_detailed}_i = k) \nonumber \\ &\quad + \sum_{k=2}^{5} \beta_{5k} \mathrm{I}(\text{general}_i = k) + \sum_{k=2}^{5} \beta_{6k} \mathrm{I}(\text{mental}_i = k) \nonumber \\ &\quad + \beta_7 \mathrm{I}(\text{hypertension}_i) + \beta_8 \mathrm{I}(\text{hyperlipidemia}_i) \nonumber \\ &\quad + \beta_9 \text{income}_i + \beta_{10} \text{ndvisit}_i \tag{2.2} \end{align}\] Where \(\eta_{1i}\) is the linear predictor for the Probit model, \(\gamma_j\) are the coefficients for the Probit, \(\beta_j\) are the coefficients for the Gamma GLM with a log link, and \(\mathrm{I}(\cdot)\) is the indicator function, equal to 1 if the condition is met and 0 otherwise. For categorical variables (ethnicity, region, education_cat_detailed general and mental health), the summation \(\sum\) indicates that separate coefficients are estimated for each level \(k\) relative to the reference category (\(k=1\)).

2.2 Health-care utilisation

The count of doctor visits (dvisit) exhibited significant overdispersion (Variance: 13.1 > Mean: 2.1), violating the assumptions of a standard Poisson model. A Negative Binomial regression was fitted to account for the excess variability and lasso screening and prior domain knowledge were used to inform covariate choice (Equation (2.3)).

A Poisson regression model with the same covariates as the final specification was also fitted and a formal overdispersion test, comparing the Poisson residual deviance to a \(\chi^2\) distribution with the corresponding residual degrees of freedom, yielded a Pearson dispersion of 4.18 (p < 0.001). The Poisson model had a higher Akaike Information Criterion (AIC) than the Negative Binomial model (AIC\(_\text{Poisson}\) = 50386 versus AIC\(_\text{NegBin}\) = 37334), further supporting the justification for a Negative Binomial model.

\[\begin{align} % Count model: Negative Binomial \text{Count model: } \text{dvisit}_i &\sim \text{NegBin}(\mu_i, \kappa) \nonumber \\[4pt] \log(\mu_i) &= \beta_0 + \beta_1 \text{age}_i + \beta_2 \mathrm{I}(\text{gender}_i) + \beta_3 \text{bmi}_i \nonumber \\ &\quad + \sum_{k=2}^{5} \beta_{4k} \mathrm{I}(\text{general}_i = k) + \sum_{k=2}^{5} \beta_{5k} \mathrm{I}(\text{mental}_i = k) \nonumber \\ &\quad + \sum_{k=2}^{4} \beta_{6k} \mathrm{I}(\text{ethnicity}_i = k) + \sum_{k=2}^{4} \beta_{7k} \mathrm{I}(\text{region}_i = k) \nonumber \\ &\quad + \beta_8 \mathrm{I}(\text{hypertension}_i) + \beta_9 \mathrm{I}(\text{hyperlipidemia}_i) \nonumber \\ &\quad + \beta_{10} \text{income}_i + \beta_{11} \text{ndvisit}_i \nonumber \\ &\quad + \sum_{k=2}^{5} \beta_{12k} \mathrm{I}(\text{education\_cat\_detailed}_i = k) \tag{2.3} \end{align}\]

To investigate the dependence structure between the frequency of visits and the intensity of expenditure beyond observed covariates, a copula approach was employed following the framework of Marra and Radice (2025b). Using Sklar’s Theorem, the joint cumulative distribution function, \(H(y_1,y_2)\), of utilisation (\(Y_1\)) and expenditure (\(Y_2\)) can be modelled by coupling their marginal distributions (\(F_1\) and \(F_2\)) via a copula function \(C\): \[ H(y_1,y_2) = C\big(F_1(y_1), F_2(y_2); \theta\big) \] where \(\theta\) is the association parameter. A Gaussian copula was fitted to the residuals of the marginal models for individuals with positive expenditure using the GJRM package in R (Marra and Radice 2025a). Kendall’s \(\tau\), a measure of rank correlation, was estimated. For a Gaussian copula \(\tau\) is related to the latent correlation, \(\theta\), by \[ \tau = \frac{2}{\pi}\arcsin{(\theta)} \quad \rightarrow \theta = \sin{\left( \frac{\pi\tau}{2} \right)} \]

3 Results

Both the Gamma and Negative Binomial models were fitted with a log link, so their coefficients are interpreted multiplicatively. For a predictor \(x_k\) with coefficient \(\beta_k\), a \(\Delta\)-unit increase in \(x_k\) multiplies the mean (conditional mean cost in the Gamma model and expected visit count in the Negative Binomial model) by \[ \exp(\Delta \beta_k). \] and the percent change is given by \[ 100\left(\exp(\Delta \beta_k)-1\right)\%. \]

3.1 Model 1: Health care expenditure

The two-part model effectively captures the dual processes of access and cost. In the first stage (Probit), females, older adults, and those with chronic conditions (hypertension, hyperlipidemia) were significantly more likely to incur expenditure (Table 3.1). In the second stage (Gamma GLM), conditional on seeking care, significant disparities in expenditure emerged. A 10-year increase in age is associated with a 9% increase in conditional costs. Interestingly, males incur 17.7% lower costs than females (p < 0.001). This finding aligns with work by Bertakis et al. (2000) who suggests women have higher healthcare engagement and diagnostic usage during reproductive years and for preventive screenings.

Self-reported general and mental health showed a strong effect with individuals who report “Poor” health incurring 197% higher costs than those in “Excellent” health. These patterns are consistent with clinical expectations - individuals who are sicker and/or more health-engaged both enter the system more often and incur higher costs once they do so.

The two-part model achieves an Root Mean Square Error of $1599 and a Mean Absolute Error of $553. The predictive performance was robust at the aggregate level, with the total predicted cost within 1.8% of the actual total, although the model underestimates extreme outliers (Figure 3.1).

Distribution of prediction errors (actual cost − predicted cost) by expenditure category. The boxplots display the median and interquartile range of errors for patients grouped by their actual healthcare expenditure and the red dashed line represents a perfect prediction. The large positive errors for the highest cost categories show that the model fails to account for the full magnitude of catastrophic health expenditures.

Figure 3.1: Distribution of prediction errors (actual cost − predicted cost) by expenditure category. The boxplots display the median and interquartile range of errors for patients grouped by their actual healthcare expenditure and the red dashed line represents a perfect prediction. The large positive errors for the highest cost categories show that the model fails to account for the full magnitude of catastrophic health expenditures.

3.2 Model 2: Health care utilisation

As shown in Table 3.2, utilisation patterns mirrored expenditure. Black and “Other” ethnic groups showed significantly lower visit counts compared to White patients (Incident Rate Ratios < 1), consistent with structural barriers to access and utilisation documented in the US healthcare system (Waidmann and Rajan 2000; Macias-Konstantopoulos et al. 2023). Non-physician visits (ndvisit) were positively associated with doctor visits (11% increase per visit), perhaps suggesting complementarity rather than substitution between provider types.

Table 3.1: Estimated coefficients (β), standard errors, and p-values from the two-part model for doctor-visit expenditure (dvexpend), including a Probit model for the probability of any expenditure (Part 1) and a Gamma GLM with log link for positive expenditure (Part 2).
Pt 1: Prob. of any expense
Pt 2: Expenditure amount
Term Estimate Std. Error p-value Estimate Std. Error p-value
(Intercept) -0.780 0.084 <0.001 5.634 0.129 <0.001
age 0.008 0.001 <0.001 0.009 0.002 <0.001
bmi 0.006 0.002 0.009
income 0.000 0.000 <0.001 0.000 0.000 0.002
ndvisit 0.113 0.008 <0.001 0.070 0.007 <0.001
General Health (ref: Excellent)
Poor 0.761 0.111 <0.001 1.090 0.160 <0.001
Fair 0.538 0.063 <0.001 0.598 0.112 <0.001
Good 0.301 0.046 <0.001 0.306 0.089 <0.001
VGood 0.159 0.041 <0.001 0.023 0.083 0.779
Mental Health (ref: Excellent)
Poor 0.156 0.139 0.259 0.262 0.206 0.203
Fair 0.113 0.071 0.114 0.147 0.119 0.214
Good -0.053 0.044 0.223 0.057 0.081 0.482
VGood -0.034 0.039 0.385 0.087 0.075 0.243
Education (ref: Less than High School)
High School Graduate 0.082 0.037 0.027 0.142 0.078 0.069
Some College 0.241 0.040 <0.001 0.251 0.081 0.002
College Graduate 0.358 0.047 <0.001 0.290 0.092 0.002
Post-Graduate 0.397 0.058 <0.001 0.409 0.107 <0.001
Ethnicity (ref: White)
Black -0.054 0.034 0.113
Native American 0.179 0.149 0.232
Others -0.146 0.048 0.002
Gender (ref: Female)
Male -0.525 0.027 <0.001 -0.195 0.054 <0.001
Hyperlipidemia
Yes 0.445 0.037 <0.001 0.138 0.063 0.029
Hypertension
Yes 0.360 0.036 <0.001 0.097 0.063 0.128
Region (ref: Northeast)
Midwest 0.034 0.045 0.455 -0.017 0.084 0.838
South -0.098 0.040 0.013 -0.192 0.077 0.012
West -0.178 0.043 <0.001 0.006 0.082 0.945
Table 3.2: Estimated coefficients (β), standard errors, and p-values from a Negative Binomial regression model for the number of doctor visits (dvisit).
Term Estimate Std. Error p-value
(Intercept) -0.689 0.092 <0.001
age 0.010 0.001 <0.001
bmi 0.008 0.002 <0.001
income 0.000 0.000 <0.001
ndvisit 0.101 0.004 <0.001
General Health (ref: Excellent)
Poor 1.178 0.095 <0.001
Fair 0.812 0.064 <0.001
Good 0.467 0.050 <0.001
VGood 0.234 0.046 <0.001
Mental Health (ref: Excellent)
Poor 0.588 0.119 <0.001
Fair 0.368 0.068 <0.001
Good 0.059 0.046 0.202
VGood 0.061 0.042 0.147
Education (ref: Less than High School)
High School Graduate 0.095 0.041 0.022
Some College 0.286 0.044 <0.001
College Graduate 0.387 0.051 <0.001
Post-Graduate 0.453 0.060 <0.001
Ethnicity (ref: White)
Black -0.101 0.037 0.007
Native American -0.070 0.158 0.656
Others -0.126 0.053 0.018
Gender (ref: Female)
Male -0.629 0.029 <0.001
Hyperlipidemia
Yes 0.369 0.037 <0.001
Hypertension
Yes 0.318 0.037 <0.001
Region (ref: Northeast)
Midwest 0.031 0.047 0.516
South -0.133 0.043 0.002
West -0.193 0.046 <0.001

3.3 Dependence structure

The copula analysis of positive spenders revealed a moderate-to-strong positive dependence between frequency and severity. This implies that patients who have more doctor’s appointments also tend to have higher-than-expected costs per visit, suggesting a compounding resource burden for patients with higher health care demands.

As shown in Figure 3.2, there is a clear positive association between the number of visits and total expenditure. While some mechanical correlation is expected (more visits naturally equal more cost), the estimated Gaussian copula parameter (Kendall’s \(\hat{\tau} =\) 0.522, \(\hat{\theta} =\) 0.731) indicates a dependence that extends beyond simple accumulation. This finding is in agreement with recent work by Marra and Radice (2025b) who used the same MEPS data, and also found that ‘heavy’ patient’s are distinct, not only in their visit frequency but also in their resource consumption intensity. This supports the use of a joint frequency-severity modelling framework over independent models.

Joint distribution of healthcare utilisation and expenditure for patients with at least one doctor's visit. The vertical banding reflects the discrete nature of visit counts and the y-axis shows total doctor-visit expenditure on a logarithmic scale to accommodate skewness. The overlaying positive trend highlights that expenditure increases with visit frequency.

Figure 3.2: Joint distribution of healthcare utilisation and expenditure for patients with at least one doctor’s visit. The vertical banding reflects the discrete nature of visit counts and the y-axis shows total doctor-visit expenditure on a logarithmic scale to accommodate skewness. The overlaying positive trend highlights that expenditure increases with visit frequency.

4 Discussion and conclusion

This analysis highlights the multifaceted drivers of healthcare demand and confirms that the decision to seek care and the resulting cost intensity are driven by distinct, but overlapping factors. Significant gender and health-status gaps reinforce the need for risk-adjustment models that explicitly account for biological and systemic usage differences (Bertakis et al. 2000).

A key strength of this study is the application of a joint frequency-severity framework. The estimated Gaussian copula parameter (\(\hat{\tau}\) = 0.522) aligns with recent work by Marra and Radice (2025b) on the same MEPS dataset. Marra and Radice (2025b), despite using a more advanced Rotated Copula with Gumbel and Dagum models, reported a similar dependence (\(\hat{\theta}\) = 2.67 corresponding to \(\hat{\tau}\) = 0.6251). This report’s replication of this finding using a parsimonious Gamma-Gaussian framework further validates the link between visit frequency and cost intensity as a fundamental structural feature of healthcare demand, rather than a model artefact.

Interpreting these coefficients as causal effects is constrained by observational biases. While the analysis identified ethnic disparities, chiming with the work of Macias-Konstantopoulos et al. (2023) on structural limits to healthcare access, the omission of insurance data acts as a confounding variable. Coefficients for income and ethnicity likely absorb variance related to coverage types, potentially biasing socio-economic conclusions. The positive association with non-doctor visits likely reflects simultaneity bias, where unobserved health shocks drive demand for both physician and non-physician care, rather than a clean substitution effect. In addition, selection bias is introduced by conditioning the Copula only on positive spenders, limiting generalisability to the extensive margin.

From a statistical standpoint, the GLM framework imposes strict linearity, missing non-linear health deterioration thresholds (such as rapid age-BMI interactions). Diagnostic plots also reveal that the Gamma distribution and Gaussian Copula underestimate the extreme right tail and enforce symmetric dependence. Future work may address these catastrophic costs, potentially by using tree-based machine learning to detect non-linearities, or by integrating actuarial Extreme Value Theory (EVT).

The transition from univariate averages to multivariate dependence structures is not just a statistical refinement, but a necessity for financial sustainability. Accurately modelling the compound risk of the high-cost patients through EVT and Copulas will enable health systems to better price the heavy tail of demand, ensuring the long-term viability of the health and insurance ecosystems that support the health of the many.

Appendices

A Model Diagnostics

A.1 Expenditure Model (Gamma GLM)

The Gamma model with log link handles the skewness of positive expenditures. The residual plots in Figure A.1 show minor deviations at the largest fitted values.

Diagnostic plots for the Gamma GLM (Expenditure).

Figure A.1: Diagnostic plots for the Gamma GLM (Expenditure).

  • The Q-Q plot shows points deviating sharply upwards from the diagonal line at the upper quantiles. This confirms the model’s limitation in capturing the most extreme catastrophic costs.
  • The Residuals vs Fitted plot displays a fanning pattern where spread increases with fitted values. This is expected for a Gamma specification, where variance scales with the mean.
  • Influential Observations: Cook’s distance values are below 0.5, indicating that no single high-cost patient is disproportionately driving the model parameters.
Deviance residuals versus fitted values for the Gamma GLM of conditional doctor-visit expenditure. Each point represents a respondent and the dashed line indicates a residual value of zero.

Figure A.2: Deviance residuals versus fitted values for the Gamma GLM of conditional doctor-visit expenditure. Each point represents a respondent and the dashed line indicates a residual value of zero.

A.2 Utiliation model (Negative Binomial)

The Negative Binomial model accounts for overdispersion in visit counts. The diagnostics indicate a reasonable fit for the count data structure.

Diagnostic plots for the Negative Binomial (Utilisation).

Figure A.3: Diagnostic plots for the Negative Binomial (Utilisation).

  • The Q-Q plot follows the reference line closely, including the tails showing the Negative Binomial distribution successfully accounts for the overdispersion in the visit count data.
  • While there are some high-frequency users, Cook’s distance remains low, suggesting the model is robust to these ‘extreme’ users.
Deviance residuals versus fitted values for the Negative Binomial regression of doctor-visit counts.

Figure A.4: Deviance residuals versus fitted values for the Negative Binomial regression of doctor-visit counts.

Deviance residuals for utilisation show no strong systematic pattern, with the residuals centred around the dashed horizontal line at zero, confirming the model structure is appropriate. The plot displays distinct curve-like striations, which reflect the discrete integer nature of the visit count data. Most observations cluster near zero, though some large positive residuals occur at lower fitted values, indicating occasional under-prediction where patients had significantly more visits than the model anticipated. The variance remains relatively stable across the range of fitted values, confirming that the Negative Binomial distribution successfully handles the overdispersion.

B Bonus exploration: R Shiny dashboard

To complement the regression analysis, address space constraints, and in a optimistic pursuit of potential extra credit, an interactive R Shiny web application was developed to allow one to explore the models described in this report further. This dashboard, or web application, allows one to dynamically input specific patient characteristics and observe the resulting predictions from the fitted models. It is accessible via:

Figure B.1: Annotated interface of the MEPS Healthcare Model Dashboard. The interface highlights (1) a button that runs the model for the selected parameters, (2) patient covariate configuration, (3) results navigation, and (4) the predicted expenditure and utilisation outcomes.

Figure B.1 shows an annotated view of the dashboard, including:

  1. An Estimate Outcomes button, which runs the prediction logic using the selected inputs against using the Two-Part and Negative Binomial models.
  2. A sidebar containing interactive sliders and input fields for key covariates, including continuous and categorical variables.
  3. Two tabs allowing the user to toggle between the numerical estimates and a visual comparison of the patient against the population distribution.
  4. Key results cards displaying the calculated Expected Annual Expenditure (derived from the combined Probit and Gamma models) and the Expected Doctor Visits (derived from the Negative Binomial model).

C Reproducibility, accessibility, & word count

C.1 Reproducibility & accessibility

An accessible HTML version of this report is available via a public GitHub page:

This report was created in R Markdown. The source code is open source and available via:

C.2 Word count

The report R Markdown file was used to produce a tex file and the tool texcount was used to count the words within each section. The output of texcount is shown below. Excluding the appendix, Figures and Tables the word count is 1554 (+3.6%).

texcount smm634-individual-assignment.tex 

File: smm634-individual-assignment.tex
Encoding: utf8
Words in text: 2279
Words in headers: 59
Words outside text (captions, etc.): 307
Number of headers: 18
Number of floats/tables/figures: 11
Number of math inlines: 32
Number of math displayed: 6
Subcounts:
  text+headers+captions (#headers/#floats/#inlines/#displayed)
  4+13+0 (1/0/0/0) _top_
  241+1+0 (1/0/0/0) Section: Introduction
  78+1+38 (1/2/0/0) Section: Methods
  204+2+0 (1/0/7/1) Subsection: Health-care expenditure
  218+2+0 (1/0/13/3) Subsection: Health-care utilisation
  55+1+0 (1/0/4/2) Section: Results
  195+5+62 (1/1/0/0) Subsection: Model 1: Health care expenditure
  62+5+57 (1/2/2/0) Subsection: Model 2: Health care utilisation
  139+2+50 (1/1/2/0) Subsection: Dependence structure
  362+3+8 (1/0/4/0) Section: Discussion and conclusion
  0+1+0 (1/0/0/0) Section: Appendices
  0+2+0 (1/0/0/0) Section: Model Diagnostics
  101+4+35 (1/2/0/0) Subsection: Expenditure Model (Gamma GLM)
  158+4+20 (1/2/0/0) Subsection: Utiliation model (Negative Binomial)
  158+5+37 (1/1/0/0) Section: Bonus exploration: R Shiny dashboard
  0+4+0 (1/0/0/0) Section: Reproducibility, accessibility, \& word count
  30+2+0 (1/0/0/0) Subsection: Reproducibility \& accessibility
  268+2+0 (1/0/0/0) Subsection: Word count
...

Agency for Healthcare Research and Quality. 2023. “Medical Expenditure Panel Survey: Download Data Files, Documentation, and Codebooks.” https://meps.ahrq.gov/mepsweb/data_stats/download_data_files.jsp; U.S. Department of Health; Human Services.
Bertakis, K. D., R. Azari, L. J. Helms, E. J. Callahan, and J. A. Robbins. 2000. Gender Differences in the Utilization of Health Care Services. The Journal of Family Practice 49 (2): 147–52.
Blough, David K., Carolyn W. Madden, and Mark C. Hornbrook. 1999. “Modeling Risk Using Generalized Linear Models.” Journal of Health Economics 18 (2): 153–71. https://doi.org/https://doi.org/10.1016/S0167-6296(98)00032-0.
Deb, Partha, and Edward C. Norton. 2018. “Modeling Health Care Expenditures and Use.” Annual Review of Public Health 39 (1): 489–505. https://doi.org/10.1146/annurev-publhealth-040617-013517.
Keehan, Sean P., Andrew J. Madison, John A. Poisal, Gigi A. Cuckler, Sheila D. Smith, Andrea M. Sisko, Jacqueline A. Fiore, and Kathryn E. Rennie. 2025. “National Health Expenditure Projections, 2024-33: Despite Insurance Coverage Declines, Health to Grow as Share of GDP.” Health Affairs 44 (7): 776–87. https://doi.org/10.1377/hlthaff.2025.00545.
Macias-Konstantopoulos, Wendy L., Kimberly A. Collins, Rosemarie Diaz, Herbert C. Duber, Courtney D. Edwards, Anthony P. Hsu, Megan L. Ranney, Ralph J. Riviello, Zachary S. Wettstein, and Carolyn J. Sachs. 2023. “Race, Healthcare, and Health Disparities: A Critical Review and Recommendations for Advancing Health Equity.” Western Journal of Emergency Medicine: Integrating Emergency Care with Population Health 24 (5). https://doi.org/10.5811/westjem.58408.
Marra, Giampiero, and Rosalba Radice. 2025a. Copula Additive Distributional Regression Using R. 1st ed. New York: Chapman; Hall/CRC. https://doi.org/10.1201/9781003593195.
———. 2025b. “Modelling Physician Visit Frequency and Costs Using a Copula Additive Distributional Regression Approach.” Journal of the Royal Statistical Society Series C: Applied Statistics, September, qlaf050. https://doi.org/10.1093/jrsssc/qlaf050.
Waidmann, Timothy A., and Shruti Rajan. 2000. “Race and Ethnic Disparities in Health Care Access and Utilization: An Examination of State Variation.” Medical Care Research and Review 57 (1_suppl): 55–84. https://doi.org/10.1177/1077558700057001S04.

  1. Calculated using \(\tau = 1 - \frac{1}{\theta}\) for the Gumbel Kendall’s tau relationship.↩︎